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Three possibilities to speed up the Hybrid Monte Carlo algorithm are investigated. Changing the step-size adap- 
tively brings no practical gain. On the other hand, substantial improvements result from using an approximate 
Hamiltonian or a preconditioned action. 
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1. Introduction 

The Hybrid Monte Carlo (HMC) algorithm0 
is now the standard method to simulate dynam- 
ical fermions. However the cost of the algorithm 
is substantial. It is therefore desirable to improve 
the efficiency of the HMC algorithm. 

First, we examine the possibility of chang- 
ing the step size adaptively||,^]. Second, we 
implement the HMC algorithm with approxi- 
mate Hamiltonians. Lastly, we precondition the 
fermionic action used in the HMC algorithm. 

2. HMC with adaptive step size 

The conventional HMC algorithm is performed 
with a fixed step size. One expects however 
that the molecular dynamics (MD) trajectory will 
bounce off the effective energy barrier represented 
by the minimum of the fermionic determinant. As 
it does, the curvature increases and so does the 
integration error. This effect should be more pro- 
nounced for smaller quark masses. Therefore one 
might expect, in that regime, benefits from vary- 
ing the step size adaptively. 

The naive way to adapt the step size consists 
in keeping the local error constant at each in- 
tegration step. But care must be taken to pre- 
serve time-reversibility. Stoffer has constructed a 
time-reversible adaptive step size method which 
gave promising gains on the Kepler problem[Q. 
Here we implement this method for the HMC al- 
gorithm. 

Call T(At) a one-step integrator; T(At) : 
(p,U) — ► {jp 1 ,U'). It maps momenta and link 



variables (p,U) onto (p',U') 
symmetric error estimator, 



Now we define a 



E S ( P , U:At) = e(p, U : At) + e(p', U' : -At), (1) 

where e(p, U : At) is a local error at (p, U) when 
the system is integrated by T(At). If the inte- 
grator is reversible, eq.(|l|) is obviously symmetric 
under the exchange (p, U, At) < — ► (p' , V, — At). 
The adaptive step size At is then determined by 
solving a symmetric error equation, 



Es(p, U : At) = tolerance 



(2) 
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The tolerance should be kept constant during the 
MD simulation. The adaptive step size deter- 
mined by eq.(||) takes the same value at the re- 
flected point (—p',U'), Therefore an integrator 
with the adaptive step size determined by eq.(||) 
is reversible. Taking for T the standard leapfrog 
integrator, we tested the above method for several 
quark masses, volumes and couplings |J. We ob- 
served that the variance of the step size increases 
as K increases, as expected. However, it decreases 
as the volume increases. Thus the gain over HMC 
with fixed step size turns out to be very small. 
The overhead to determine the adaptive step size 
by solving eq.(||) exceeds this gain. 

3. HMC with approximate Hamiltonian 

The leapfrog integrator with step size At has 
0(At 3 ) errors, resulting in an energy violation 
AH, which reduces the Metropolis acceptance. 
Since such errors are present anyway, there is no 
need to keep for the molecular dynamics Hamil- 
tonian Hmd the exact Hamiltonian H. An ap- 
proximate Hamiltonian Hmd which introduces 



errors of similar magnitude can be substituted at 
cheaper cost. The cost of the HMC algorithm 
is expressed by the cumulated cost of fermionic 
force evaluations per accepted trajectory: 

Ct = -r^V (3) 

Acc x At w 

for trajectories of a certain fixed length, where 
Nd is the number of multiplications by the Dirac 
operator D at each step of size At. In the follow- 
ing, we consider two possibilities to decrease Nd 
and try to find the minimum cost Cj. 

3.1. Chebyshev polynomial approximation 
to the Hamiltonian 

Liischer proposed to approximate the inverse of 
the fermion matrix D by a Chebyshev polynomial 
P n (D) H of degree n with zeroes Zk ■ 



D~ 



P n (D) = c n Ul =1 (D - z k ) 



(4) 



This approximation allows the construction of a 
local update algorithm with n auxiliary bosonic 
fields @. Here we use the same approximation for 
the inverse of the fermion matrix which appears 
in the molecular dynamics Hamiltonian: 
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h-i. 



P n (D? P n (D)4> 



(5) 



This formulation does not require any linear 
solver. The fermionic force evaluation requires in- 
stead a predetermined Nd = 2n multiplications 
by D. As n is reduced however, the approxima- 
tion (3.1) increases the energy violation, which re- 
duces the Metropolis acceptance. Figjlja) shows 
the efficiency 2/Ct vs. At. We find that the op- 
timal step-size depends little on the degree n of 
the polynomial. For the optimal n, the work is 
reduced by about 4 over standard HMC. 

3.2. Low accuracy stopping criterion 

Usually, in the molecular dynamics steps, the 
linear system which gives the fermionic force is 
solved to high accuracy. The initial guess to the 
solution of this linear system can then be extrap- 
olated from previous results without measurably 
spoiling time-reversibility. In this way, the num- 
ber of iterations in the solver can be reduced by 
a factor ~ 2|@]. On the other hand, it is also pos- 
sible to reduce the accuracy in the solver, pro- 
vided that the solution is time-reversible. This is 



4 4 '(3=5.00 kJ=o'.15o' 



ml t * 
* *} I 

***** 



(3=5.00 k=0.150 



+ +++ + 




+ HMC 
(66) 



Figure 1. (a) Efficiency vs. step-size At for poly- 
nomials of various degree n. For the standard 
HMC, the number of iterations in the BiCGjs 
solver was taken as n. The stopping accuracy 
was set to ||r|| = 10~ 8 , where ||r|| is the resid- 
ual norm, (b) Efficiency vs. residual norm. The 
trajectory length is 1(0.5) for a 4 4 (8 4 ) lattice. 

achieved simply by forfeiting the benefits of time 
extrapolation, and taking always the same initial 
guess for the solver. While this second approach 
is well-known, its benefits have not been studied 
to a comparable extent. 

Here we systematically investigate the optimal 
accuracy, which minimizes the cost of HMC, by 
varying the stopping criterion in the solver. In 
this case, the actual form of the corresponding 
Hamiltonian Hmd can not be known. Fig.|](b) 
shows efficiency, acceptance and number of it- 
erations. The number of iterations linearly de- 
creases with the logarithm of the stopping crite- 
rion. On the other hand, the acceptance remains 
practically constant up to a certain stopping cri- 
terion and then rapidly decreases. We measure 
the efficiency by Acc/ 4^ of iterations. The figure 
clearly shows that a low accuracy stopping crite- 
rion (||r|| 2 /lko|| 2 ~ 10" 5 and 1(T 6 on 4 4 and 8 4 
lattices, respectively) is most efficient. However 
the gain over standard HMC (||r||/||r || = 10~ 8 ) 
appears to decrease as the volume increases. 



4. HMC with preconditioned action 

The use of preconditioners is well-known for the 
solution of linear systems, like the one which gives 
the fermionic force Ff at each step. On the other 
hand, little attention has been paid to precondi- 



tioning the fermionic action itself. One exception 
is even-odd preconditioning: 



S f = 4(DlD ee )- 1 c 
where 



(6) 



det(l-«M) = det(l- n 2 M eo M oe ) = det(D ee ).(7) 

This formulation is widely used to reduce mem- 
ory requirements and work per force evalua- 
tion. What has not been noticed is that the 
fermionic force from (Q) is smaller than in the 
non-preconditioned case (see Fig.|J), allowing for 
an increase in the step-size without loss of accep- 
tance. This benefit stems from the better con- 
ditioning of D ee (to order k 2 ) than D (to order 
k) . This preconditioning can be pushed to higher 
order with virtually no overhead. Consider the 
partitioning of D ee : 

D ee = l- (K 2 Me M e)+ ~ {k 2 M eo M oe ) - , (8) 

where +(-) stands for the lower(upper) triangular 
part. The lower (upper) triangular matrices L and 
U = 1 — (k 2 M eo M oe )± have determinant 1. The 
preconditioned matrix L~ 1 D ee U~ 1 = 1 + C(k 4 ) 
can then be substituted to D ee in the fermionic 
action: 



Sf = ^[(L^DeeU-yiL^DeeU- 



(9) 



The corresponding fermionic force is obtained as: 
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f 



fdD ee ,+ dU t dL r+ , 



with rj e = Dl~ 1 W <fi e ', x e = LL^rje- Since no 
inverse of L or U appears in this formulation, the 
overhead in the force calculation is negligible. As 
shown in Fig.|| the fermionic force is reduced by a 
factor ~ 2 as Km changes from (standard even- 
odd) to its optimum slightly below k. It becomes 
much smaller than the gauge force. A Sexton- 
Weingarten integration scheme then allows to 
take very large fermionic step-sizes. 

This ILU preconditioner can be implemented 
to higher order in n, with some programming 
headache. The same idea can also be used for 
staggered fermions. 




Figure 2. RMS force for various actions. 

We tested ILU preconditioning eq.(||) on a 
16 3 x 32 lattice, at parameters studied by the 
SESAM collaboration {fi = 5.6, n = 0.1575) @. 
We could increase the fermionic step-size from 

0. 01 (SESAM) to 0.05 while maintaining a 66% 
acceptance. A more extended, though still pre- 
liminary test was conducted on an 8 4 lattice at 
j3 = 5.6, k = 0.1585. These parameters were cho- 
sen in H to compare HMC with the multiboson 
method, very near criticality. ILU precondition- 
ing allowed us to increase the fermionic step-size 
from 0.075 to 0.12. At the same time, we used a 
low-accuracy stopping criterion ||r||/||?"o|| = 10 -3 
during the trajectory, reducing the number of ma- 
trix multiplications per step from ~ 2 x 159 to 
~ 2 x 66. The autocorrelation time of the pla- 
quette seems unaffected by preconditioning. 

If these preliminary indications are confirmed, 
the prospect is to improve the efficiency of HMC 
by a factor 0(5). 
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